The source code for this post is available on my GitHub.

    import numpy as np
    import pandas as pd

3.4.1 Universal functions: preserving the index

    rng = np.random.RandomState(42)
    ser = pd.Series(rng.randint(0, 10, 4))  # line inferred from the output below
    ser

    0    6
    1    3
    2    7
    3    4
    dtype: int32


    df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                      columns=['A', 'B', 'C', 'D'])  # columns inferred from the output table
    df


| | A | B | C | D |
|---|---|---|---|---|
| 0 | 6 | 9 | 2 | 6 |
| 1 | 7 | 4 | 3 | 7 |
| 2 | 7 | 2 | 5 | 4 |

Applying a NumPy ufunc to either of these objects produces another pandas object with the index preserved.

    np.exp(ser)

    0     403.428793
    1      20.085537
    2    1096.633158
    3      54.598150
    dtype: float64


    np.sin(df * np.pi / 4)

| | A | B | C | D |
|---|---|---|---|---|
| 0 | -1.000000 | 7.071068e-01 | 1.000000 | -1.000000e+00 |
| 1 | -0.707107 | 1.224647e-16 | 0.707107 | -7.071068e-01 |
| 2 | -0.707107 | 1.000000e+00 | -0.707107 | 1.224647e-16 |

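
The examples above use the default integer index, which makes it hard to see that the labels really survive. A minimal sketch with explicit labels may make the point clearer; the names `temps` and `frame` and their values are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical labelled data, not taken from the post.
temps = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])
frame = pd.DataFrame({'x': [0.0, np.pi / 2]}, index=['r1', 'r2'])

exp_temps = np.exp(temps)   # ufunc applied to a Series
sin_frame = np.sin(frame)   # ufunc applied to a DataFrame

# Both results are still pandas objects carrying the original labels.
print(type(exp_temps).__name__, list(exp_temps.index))
print(type(sin_frame).__name__, list(sin_frame.index), list(sin_frame.columns))
```

The ufunc only transforms the values; the row and column labels are carried over unchanged.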
3.4.2 Universal functions: index alignment
For binary operations on two objects, pandas aligns their indices before computing the result.
1. Index alignment in Series

    area = pd.Series({'Alaska': 1723, 'Texas': 6871, 'California': 4235}, name='area')
    area

    Alaska        1723
    Texas         6871
    California    4235
    Name: area, dtype: int64

    # definition inferred from the output below
    population = pd.Series({'California': 1456434, 'Texas': 654687, 'New York': 4565732}, name='population')
    population

    California    1456434
    Texas          654687
    New York      4565732
    Name: population, dtype: int64

    population / area

    Alaska               NaN
    California    343.904132
    New York             NaN
    Texas          95.282637
    dtype: float64

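The index of the result is the union of the two input indices, and any position missing from either input is filled with NaN. If NaN is not the result you want, use the method form of the operator, whose `fill_value` parameter supplies a default for entries missing from A or B.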
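
To see exactly which labels end up as NaN, it can help to compare the result's index with the union and intersection of the input indices. The sketch below uses two small hypothetical Series (`left` and `right` are illustrative names, not from the post).

```python
import pandas as pd

# Hypothetical data, chosen only to illustrate alignment.
left = pd.Series({'x': 10, 'y': 20, 'z': 30})
right = pd.Series({'y': 2, 'z': 5, 'w': 4})

ratio = left / right

# Labels present in both inputs produce real values ...
shared = left.index.intersection(right.index)
# ... while the result carries every label from either input.
union = left.index.union(right.index)

print(ratio)
print("shared labels:", sorted(shared))
print("NaN labels:   ", sorted(ratio[ratio.isna()].index))
```

Only the shared labels get a computed value; every label that appears in just one input comes back as NaN.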

    A = pd.Series([2, 4, 6], index=[0, 1, 3])
    A

    0    2
    1    4
    3    6
    dtype: int64

    B = pd.Series([1, 5, 8], index=[0, 1, 2])  # definition inferred from the output below
    B

    0    1
    1    5
    2    8
    dtype: int64

    A + B

    0    3.0
    1    9.0
    2    NaN
    3    NaN
    dtype: float64


    # A custom fill value for missing entries can be set here
    A.add(B, fill_value=0)  # call inferred from the output below

    0    3.0
    1    9.0
    2    8.0
    3    6.0
    dtype: float64

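
One way to think about `fill_value` is as a reindex to the union of the two indices followed by ordinary addition. The sketch below spells that out by hand; `B` is reconstructed from the printed output above, and the manual version is only an illustration of the idea, not a description of how pandas implements it.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 3])
B = pd.Series([1, 5, 8], index=[0, 1, 2])  # assumed from the printed output

# Union of both indices: the index the result will carry.
union = A.index.union(B.index)

# Manual version: extend each Series to the union, filling gaps with 0.
manual = A.reindex(union, fill_value=0) + B.reindex(union, fill_value=0)

# Built-in version using the method form of the operator.
builtin = A.add(B, fill_value=0)

print(manual.tolist())
print(builtin.tolist())
```

The two results hold the same values here, although the built-in version returns floats because alignment introduces NaN before filling. The equivalence relies on every label being present in at least one input; where a label is missing from both, `fill_value` leaves NaN.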
2. Index alignment in DataFrame

    C = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                     columns=['C', 'D'])  # columns inferred from the output table
    C

| | C | D |
|---|---|---|
| 0 | 9 | 15 |
| 1 | 14 | 14 |


    D = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                     columns=['D', 'E', 'C'])  # columns inferred from the output table
    D

| | D | E | C |
|---|---|---|---|
| 0 | 2 | 6 | 3 |
| 1 | 8 | 2 | 4 |
| 2 | 2 | 6 | 4 |


    C + D

| | C | D | E |
|---|---|---|---|
| 0 | 12.0 | 17.0 | NaN |
| 1 | 18.0 | 22.0 | NaN |
| 2 | NaN | NaN | NaN |


    # fill_value can likewise be used to supply a custom default
    fill = C.stack().mean()
    C.add(D, fill_value=fill)  # call inferred from the output table

| | C | D | E |
|---|---|---|---|
| 0 | 12.0 | 17.0 | 19.0 |
| 1 | 18.0 | 22.0 | 15.0 |
| 2 | 17.0 | 15.0 | 19.0 |

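
For DataFrames, alignment happens on rows and columns at the same time. The following sketch uses two small hypothetical frames (`left` and `right`, not from the post) that overlap in only one cell, so the effect of `fill_value` on each kind of gap is easy to read off.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with partly overlapping rows and columns.
left = pd.DataFrame([[1, 2], [3, 4]], index=[0, 1], columns=['x', 'y'])
right = pd.DataFrame([[10, 20], [30, 40]], index=[1, 2], columns=['y', 'z'])

# Operator form: only the cell present in both frames, (1, 'y'), gets a value.
total = left + right

# Method form: a cell missing from one frame is treated as 0;
# a cell missing from both frames stays NaN.
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```

In `filled`, the cells `(0, 'z')` and `(2, 'x')` remain NaN because neither frame has a value there, so there is nothing for `fill_value` to combine with.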
3.4.3 Universal functions: operations between DataFrame and Series

    A = rng.randint(10, size=(3, 4))
    A

    array([[8, 6, 1, 3],
           [8, 1, 9, 8],
           [9, 4, 1, 3]])

    A - A[0]

    array([[ 0,  0,  0,  0],
           [ 0, -5,  8,  5],
           [ 1, -2,  0,  0]])


    df = pd.DataFrame(A, columns=list('QWER'))
    df - df.iloc[0]  # line inferred from the output table

| | Q | W | E | R |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | -5 | 8 | 5 |
| 2 | 1 | -2 | 0 | 0 |


    df['R']

    0    3
    1    8
    2    3
    Name: R, dtype: int32

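
The section stops at selecting column `R`. A plausible next step, assumed here rather than taken from the post, is to subtract that column from every column using the `axis` argument of `DataFrame.subtract`. The sketch rebuilds `df` from the array values printed above so that it runs on its own.

```python
import pandas as pd

# Same values as the array A printed earlier in this section.
df = pd.DataFrame([[8, 6, 1, 3], [8, 1, 9, 8], [9, 4, 1, 3]],
                  columns=list('QWER'))

# Default behaviour: the Series index is matched against the column labels,
# so the first row is subtracted from every row.
by_row = df - df.iloc[0]

# axis=0: the Series index is matched against the row labels instead,
# so column 'R' is subtracted from every column.
by_col = df.subtract(df['R'], axis=0)

print(by_row)
print(by_col)
```

With `axis=0`, the `R` column of the result is all zeros, which is a quick check that the subtraction ran down the columns.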